Application/Control Number: 10/528,326
Art Unit: 1633 



RESULT 1
US-10-487-078-25
; Sequence 25, Application US/10487078
; Publication No. US20050064543A1
; GENERAL INFORMATION:
; APPLICANT: INCYTE CORPORATION; TANG et al
; TITLE OF INVENTION: SECRETED PROTEINS
; FILE REFERENCE: PF-1141 USN
; CURRENT APPLICATION NUMBER: US/10/487,078
; CURRENT FILING DATE: 2004-02-18
; PRIOR APPLICATION NUMBER: PCT/US02/27143
; PRIOR FILING DATE: 2002-08-15
; PRIOR APPLICATION NUMBER: US 60/313,249
; PRIOR FILING DATE: 2001-08-17
; PRIOR APPLICATION NUMBER: US 60/314,752
; PRIOR FILING DATE: 2001-08-24
; PRIOR APPLICATION NUMBER: US 60/317,818
; PRIOR FILING DATE: 2001-09-07
; PRIOR APPLICATION NUMBER: US 60/324,040
; PRIOR FILING DATE: 2001-09-21
; PRIOR APPLICATION NUMBER: US 60/334,229
; PRIOR FILING DATE: 2002-02-13
; LENGTH: 270
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: misc_feature
; OTHER INFORMATION: Incyte ID No: 8001939CD1
US-10-487-078-25

Query Match 100.0%; Score 1450; DB 5; Length 270;
Best Local Similarity 100.0%; Pred. No. 7.4e-121;
Matches 270; Conservative 0; Mismatches 0; Indels 0; Gaps 0;
Deciphering the Message in Protein Sequences:
Tolerance to Amino Acid Substitutions

James U. Bowie,* John F. Reidhaar-Olson, Wendell A. Lim,
Robert T. Sauer



An amino acid sequence encodes a message that determines the shape and function of a protein. This message is highly degenerate in that many different sequences can code for proteins with essentially the same structure and activity. Comparison of different sequences with similar messages can reveal key features of the code and improve understanding of how a protein folds and how it performs its function.



The genome is manifest largely in the set of proteins that it encodes. It is the ability of these proteins to fold into unique three-dimensional structures that allows them to function and carry out the instructions of the genome. Thus, comprehending the rules that relate amino acid sequence to structure is fundamental to an understanding of biological processes. Because an amino acid sequence contains all of the information necessary to determine the structure of a protein (1), it should be possible to predict structure from sequence, and subsequently to infer detailed aspects of function from the structure. However, both problems are extremely complex, and it seems unlikely that either will be solved in an exact manner in the near future. It may be possible to obtain approximate solutions by using experimental data to simplify the problem. In this article, we describe how an analysis of allowed amino acid substitutions in proteins can be used to reduce the complexity of sequences and reveal important aspects of structure and function.



Methods for Studying Tolerance to 
Sequence Variation 

There are two main approaches to studying the tolerance of an amino acid sequence to change. The first method relies on the process of evolution, in which mutations are either accepted or rejected by natural selection. This method has been extremely powerful for proteins such as the globins or cytochromes, for which sequences from many different species are known (2-7). The second approach uses genetic methods to introduce amino acid changes at specific positions in a cloned gene and uses selections or screens to identify functional sequences. This approach has been used to advantage for proteins that can be expressed in bacteria [...] where the appropriate genetic manipulations are possible (...). The end results of both methods are lists of active sequences that can be compared and analyzed to identify sequence features essential for folding or function. If a particular property of a side chain, such as charge or size, is important at a given position, only side chains that have the required property will be allowed. Conversely, if the chemical identity of the side chain is unimportant, then many different substitutions will be permitted.

The authors are in the Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139.

Studies in which these methods were used have revealed that proteins are surprisingly tolerant of amino acid substitutions (..., 11). For example, in studying the effects of approximately [...] single amino acid substitutions at 142 positions in lac repressor, Miller and co-workers found that about one-half of all substitutions were phenotypically silent (11). At some positions, many different nonconservative substitutions were allowed. Such residue positions play little or no role in structure and function. At other positions, no substitutions or only conservative substitutions were allowed. These residues are the most important for lac repressor activity.
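The bookkeeping behind a "list of active sequences" can be sketched briefly in Python. The sketch below collects the residues seen at each aligned position and labels positions by how many different residues they accept. The function names, the `tolerant_min` cutoff, and the toy sequences are all hypothetical. The sketch counts distinct residues only and does not judge whether a substitution is chemically conservative.

```python
from collections import defaultdict


def allowed_residues(active_sequences):
    """Collect the residues observed at each position across active sequences.

    Assumes the sequences are already aligned and of equal length.
    """
    allowed = defaultdict(set)
    for seq in active_sequences:
        for pos, aa in enumerate(seq):
            allowed[pos].add(aa)
    return dict(allowed)


def classify_tolerance(active_sequences, tolerant_min=4):
    """Label each position 'invariant', 'restricted', or 'tolerant'.

    tolerant_min is a hypothetical cutoff: a position accepting at least this
    many different residues is called tolerant.
    """
    labels = {}
    for pos, residues in sorted(allowed_residues(active_sequences).items()):
        if len(residues) == 1:
            labels[pos] = "invariant"
        elif len(residues) >= tolerant_min:
            labels[pos] = "tolerant"
        else:
            labels[pos] = "restricted"
    return labels


# Hypothetical functional variants of a five-residue segment.
active = ["MSTLK", "MSVLE", "MAILQ", "MGLLR"]
labels = classify_tolerance(active)
print(labels)
```

In practice the lists would come from phylogenetic alignments or from a genetic selection, and any cutoff would need to be chosen for the data at hand.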

What roles do invariant and conserved side chains play in proteins? Residues that are directly involved in protein function, such as binding or catalysis, will certainly be among the most conserved. For example, replacing the Asp in the catalytic triad of trypsin with Asn results in a 10^4-fold reduction in activity. A similar loss of activity occurs in λ repressor when a DNA-binding residue is changed from Asn to Asp (...). To carry out their function, however, these catalytic and binding residues must be precisely oriented in three dimensions. Consequently, mutations in residues that are required for structure formation or stability can also have dramatic effects on activity (10, ...). Hence, many of the residues that are conserved in sets of active sequences play structural roles.



Substitutions at Surface and Buried Positions

In their initial comparisons of the globin sequences, Perutz and co-workers found that most buried residues require nonpolar side chains, whereas few features of surface side chains are conserved (6). Similar results have been seen for a number of families (2, 4, 5, 7, 17, ...). An example of the sequence tolerance at surface versus buried sites can be seen in Fig. 1, which shows the allowed substitutions in λ repressor at residue positions that [...]
Several Epitopes of p85 Glycoprotein
(CDw44) are Dependent on Intact 
Disulphide Bonds. Isolation of cDNA Clones 
Requires a Polyclonal Antibody Raised 
Against the Reduced Protein 

Ian Rogers, 1 Giacomo D'Agostaro, 2 Sonia Vera 1 and 
Michelle Letarte 14 

Received April 22, 1988 



Monoclonal antibodies 50B4 and 50E6 recognize two distinct epitopes of human p85 glycoprotein (CDw44). Both epitopes are destroyed by reduction of the purified glycoprotein, as demonstrated by inhibition of cellular radioimmunoassay and Western blot analysis. Endoglycosidase F-treated p85 glycoprotein, with an apparent molecular weight of 73,000, is still reactive with both monoclonal antibodies. Thus both epitopes are conformational determinants of the polypeptide chain. A rabbit antibody produced against purified native p85 glycoprotein also reacted only with the non-reduced form of p85. Repeated immunizations with SDS-dissociated and reduced p85 yielded a polyclonal antibody reactive by Western blot analysis with reduced and non-reduced forms of p85 glycoprotein. When a HOON leukemia cell line cDNA expression library was screened with this polyclonal antibody, two cDNA clones were isolated which reacted specifically with the antiserum and not with the control non-immune serum. Preliminary characterization of these clones indicates that they are p85-related.



KEYWORDS: CDw44; conformational epitopes; lymphocyte antigen. 

INTRODUCTION 

Human p85 glycoprotein was first identified with MAb F10-44-2 as a brain-leukocyte antigen (1, 2). This antibody, included in the Third International Workshop on human leukocyte differentiation antigens, defined a new cluster, CDw44 (3). Several MAb to the p85 glycoprotein were obtained by immunization
Fig. 1. (A) Amino acid substitutions allowed in a short region of λ repressor. The wild-type sequence is shown along the center line. The allowed substitutions shown above each position were identified by randomly mutating one to three codons at a time by using a cassette method and applying a functional selection (9). (B) The fractional solvent accessibility (42) of the wild-type side chain in the protein dimer (43) relative to the same atoms in an Ala-X-Ala model tripeptide.
selection after cassette mutagenesis. A histogram of side chain solvent accessibility in the crystal structure of the dimer is also shown in Fig. 1. At six positions, only the wild-type residue or relatively conservative substitutions are allowed. Five of these positions are buried in the protein. In contrast, most of the highly exposed positions tolerate a wide range of chemically different side chains, including hydrophilic and hydrophobic residues. Hence, it seems that most of the structural information in this region of the protein is carried by the residues that are solvent inaccessible.
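The Fig. 1 comparison can be sketched in Python. The sketch computes each side chain's accessibility relative to the same atoms in an Ala-X-Ala tripeptide, splits positions into buried and exposed groups, and compares how many substitutions each group tolerates. The accessibility numbers, the allowed-residue counts, and the 0.2 cutoff are hypothetical values chosen for illustration. They are not data from λ repressor.

```python
from statistics import mean

# Hypothetical per-position data: (side-chain accessible area in the folded
# protein, area of the same atoms in an Ala-X-Ala tripeptide, number of
# residues allowed at that position by the selection).
positions = {
    "pos1": (5.0, 110.0, 2),
    "pos2": (95.0, 120.0, 11),
    "pos3": (2.0, 150.0, 1),
    "pos4": (70.0, 100.0, 8),
    "pos5": (12.0, 140.0, 3),
}

BURIED_CUTOFF = 0.2  # hypothetical threshold on fractional accessibility


def fractional_accessibility(area_in_protein, area_in_tripeptide):
    """Accessibility of a side chain relative to the same atoms in Ala-X-Ala."""
    return area_in_protein / area_in_tripeptide


def tolerance_by_exposure(data, cutoff=BURIED_CUTOFF):
    """Mean number of allowed residues at buried vs. exposed positions."""
    buried, exposed = [], []
    for area, reference, n_allowed in data.values():
        group = buried if fractional_accessibility(area, reference) < cutoff else exposed
        group.append(n_allowed)
    if not buried or not exposed:
        raise ValueError("need at least one buried and one exposed position")
    return mean(buried), mean(exposed)


buried_mean, exposed_mean = tolerance_by_exposure(positions)
print(f"buried: {buried_mean} allowed residues on average, exposed: {exposed_mean}")
```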



Constraints on Core Sequences

Because core residue positions appear to be extremely important for protein folding or stability, we must understand the factors that dictate whether a given core sequence will be acceptable. In general, only hydrophobic or neutral residues are tolerated at buried sites in proteins, undoubtedly because of the large favorable contribution of the hydrophobic effect to protein stability (19). For example, Fig. 2 shows the results of genetic studies used to investigate the substitutions allowed at residue positions that form the hydrophobic core of the NH2-terminal domain of λ repressor (20). The acceptable core sequences are composed almost exclusively of Ala, Cys, Thr, Val, Ile, Leu, Met, and Phe. The acceptability of many different residues at each core position presumably reflects the fact that the hydrophobic effect, unlike hydrogen bonding, does not depend on specific residue pairings. Although it is possible to imagine a hypothetical core structure that is stabilized exclusively by residues forming hydrogen bonds and salt bridges, such a core would probably be difficult to construct because hydrogen bonds require pairing of donors and acceptors in an exact geometry. Thus the repertoire of possible structures that use a polar core would probably be extremely limited (21). Polar and charged residues are occasionally found in the cores of proteins, but only at positions where their hydrogen bonding needs can be satisfied (22).
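The observation that acceptable λ repressor cores are built almost exclusively from Ala, Cys, Thr, Val, Ile, Leu, Met, and Phe can be turned into a simple screen. The Python sketch below does this. The function names are hypothetical. Because the text says "almost exclusively", the screen is a heuristic. Passing it does not guarantee a stable core, since packing and steric fit also matter.

```python
# Residues listed in the text as making up acceptable λ repressor cores.
CORE_COMPATIBLE = frozenset("ACTVILMF")


def disallowed_core_residues(core):
    """Return (index, residue) pairs that fall outside the core-compatible set."""
    return [(i, aa) for i, aa in enumerate(core.upper()) if aa not in CORE_COMPATIBLE]


def passes_core_screen(core):
    """True if every residue belongs to the hydrophobic/neutral set above."""
    return not disallowed_core_residues(core)


print(passes_core_screen("VLMF"))        # every residue is in the set
print(disallowed_core_residues("VLDF"))  # the Asp at index 2 is flagged
```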

The cores of most proteins are quite closely packed (23), but some volume changes are acceptable. In λ repressor, the overall core volume of acceptable sequences can vary by about 10%. Changes at individual sites, however, can be considerably larger. For example, as shown in Fig. 2, both Phe and Ala are allowed at the same core [...]

phylogenetic studies, where it has been noted that the size decreases and increases at interacting residues are not necessarily related in a simple complementary fashion (5, 7, 17). Rather, local changes are accommodated by conformational changes in side chains and by a variety of backbone movements.
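The distinction between total core volume and per-site volume can be shown with a small Python calculation. The residue volumes below are approximate values in cubic angstroms. The specific table is an assumption, so substitute whichever volume scale you trust. The four-residue core, the swap of Phe and Ala between two sites, and the 10% window (taken from the λ repressor figure above) are illustrative only.

```python
# Approximate residue volumes in cubic angstroms (assumed values; substitute
# your preferred table) for the core-compatible residues.
RESIDUE_VOLUME = {
    "A": 88.6, "C": 108.5, "T": 116.1, "V": 140.0,
    "I": 166.7, "L": 166.7, "M": 162.9, "F": 189.9,
}


def core_volume(core):
    """Sum of residue volumes over the core positions."""
    return sum(RESIDUE_VOLUME[aa] for aa in core)


def fractional_volume_change(wild_type, variant):
    """Change in total core volume relative to the wild type."""
    wt = core_volume(wild_type)
    return (core_volume(variant) - wt) / wt


def per_site_changes(wild_type, variant):
    """Volume change at each core position, in cubic angstroms."""
    return [RESIDUE_VOLUME[b] - RESIDUE_VOLUME[a] for a, b in zip(wild_type, variant)]


def within_volume_window(wild_type, variant, window=0.10):
    """True if the total core volume stays within +/- window of the wild type."""
    return abs(fractional_volume_change(wild_type, variant)) <= window


# Hypothetical core in which Phe and Ala trade places: two sites each change by
# roughly 100 cubic angstroms, yet the total core volume is unchanged.
wt, variant = "LVFA", "LVAF"
print(per_site_changes(wt, variant))
print(within_volume_window(wt, variant))
```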



The Informational Importance of the Core

With occasional exceptions, the core must remain hydrophobic and maintain a reasonable packing density. However, since the core is composed of side chains that can assume only a limited number of conformations (24), efficient packing must be maintained without steric clashes. How important are hydrophobicity, volume, and steric complementarity in determining whether a given sequence will form an acceptable core? Each factor is essential in a physical sense, as a stable core is probably unable to tolerate unsatisfied hydrogen bonding groups, large holes, or steric overlaps (25). However, in an informational sense, these factors are not equivalent. For example, in experiments in which three core residues of λ repressor were mutated simultaneously, volume was a relatively unimportant informational constraint because three-quarters of all possible combinations of the 20 naturally occurring amino acids had volumes within the range tolerated in the core, and yet most of these sequences were unacceptable (20). In contrast, of the sequences that contained



Fig. 2. Amino acid substitutions allowed in the core of λ repressor. The wild-type side chains are shown pictorially in the approximate orientation seen in the crystal structure (43). The lists of allowed substitutions at each position are shown below the wild-type side chains. These substitutions were identified by randomly mutating one to four residues at a time by using a cassette method and applying a functional selection (20). Not all substitutions are al[...]
the appropriate hydrophobic residues, a significant fraction were acceptable. Hence, the hydrophobicity of a sequence contains more information about its potential acceptability in the core than does the total side chain volume. Steric compatibility was intermediate between volume and hydrophobicity in informational importance.
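The informational comparison can be made concrete by enumerating all 20^3 = 8,000 triplets for three core positions. The Python sketch below counts how many triplets each criterion admits. The hydrophobicity filter uses the core-compatible residues named earlier in the text. The volume filter keeps triplets whose total volume falls within a window around a wild-type triplet. The volume table, the wild-type triplet "VMV", and the 15% window are assumptions. The printed fractions depend on those choices and are not the published figures.

```python
from itertools import product

# Approximate volumes (cubic angstroms) for the 20 amino acids; the exact
# numbers are an assumption and can be replaced by another table.
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1, "P": 112.7,
    "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0, "Q": 143.8, "H": 153.2,
    "M": 162.9, "I": 166.7, "L": 166.7, "K": 168.6, "R": 173.4, "F": 189.9,
    "Y": 193.6, "W": 227.8,
}
CORE_COMPATIBLE = set("ACTVILMF")  # residues listed in the text for acceptable cores


def constraint_fractions(wild_type, window):
    """Fraction of all 20**3 triplets passing a volume filter vs. a hydrophobicity filter.

    wild_type and window are hypothetical inputs: the volume filter keeps
    triplets whose summed volume lies within +/- window (fractional) of the
    wild type's summed volume.
    """
    target = sum(VOLUME[aa] for aa in wild_type)
    triplets = list(product(VOLUME, repeat=3))
    volume_ok = sum(
        1 for t in triplets
        if abs(sum(VOLUME[aa] for aa in t) - target) <= window * target
    )
    hydrophobic_ok = sum(1 for t in triplets if all(aa in CORE_COMPATIBLE for aa in t))
    return len(triplets), volume_ok / len(triplets), hydrophobic_ok / len(triplets)


total, vol_frac, hyd_frac = constraint_fractions("VMV", window=0.15)
print(total, round(vol_frac, 3), round(hyd_frac, 3))
```

The hydrophobicity filter admits 8^3 of the 8,000 triplets, regardless of the wild type. How many triplets the volume filter admits depends entirely on the window.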



The Informational Importance of Surface Sites

We have noted that many surface sites can tolerate a wide variety of side chains, including hydrophilic and hydrophobic residues. This result might be taken to indicate that surface positions contain little structural information. However, Bashford et al., in an extensive analysis of globin sequences (4), found a strong bias against large hydrophobic residues at many surface positions. At one level, this may reflect constraints imposed by protein solubility, because large patches of hydrophobic surface residues would presumably lead to aggregation. At a more fundamental level, protein folding requires a partitioning between surface and buried positions. Consequently, to achieve a unique native state without significant competition from other conformations, it may be important that some sites have a decided preference for exterior rather than interior positions. As a result, many surface sites can accept hydrophobic residues individually, but the surface as a whole can probably tolerate only a moderate number of hydrophobic side chains.



Identification of Residue Roles from
Sets of Sequences

Often, a protein of interest is a member of a family of related sequences. What can we infer from the pattern of allowed substitutions at positions in sets of aligned sequences generated by genetic or phylogenetic methods? Residue positions that can accept a number of different side chains, including charged and highly polar residues, are almost certain to be on the protein surface. Residue positions that remain hydrophobic, whether variable or not, are likely to be buried within the structure. In Fig. 3, those residue positions in λ repressor that can accept hydrophilic side chains are shown in orange and those that cannot accept hydrophilic side chains are shown in green. The obligate hydrophobic positions define the core of the structure, whereas positions that can accept hydrophilic side chains define the surface.
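The surface/core inference can be sketched in Python as a rule applied to each alignment column. A column containing any charged or highly polar residue is called surface. A column made only of hydrophobic or neutral residues is called core. Anything else is left undetermined. Both residue classes and the toy alignment are assumptions. The core set extends the residues named in the text with Trp and Tyr.

```python
# Assumed residue classes: charged or highly polar residues mark a surface
# position; a column made only of hydrophobic/neutral residues marks the core.
HYDROPHILIC = set("DEKRNQ")
HYDROPHOBIC = set("ACTVILMFWY")


def classify_positions(aligned):
    """Assign 'surface', 'core', or 'undetermined' to each alignment column."""
    labels = []
    for column in zip(*aligned):
        residues = set(column)
        if residues & HYDROPHILIC:
            labels.append("surface")
        elif residues <= HYDROPHOBIC:
            labels.append("core")
        else:
            labels.append("undetermined")
    return labels


# Hypothetical aligned variants (from phylogeny or mutagenesis).
aligned = ["LKVEA", "IRVDA", "LEVNG", "VKIQS"]
print(classify_positions(aligned))
```

The last column mixes Ala with Gly and Ser, so the rule leaves it undetermined rather than forcing a call. Untested positions in Fig. 3 are treated the same way.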

Functionally important residues should be conserved in sets of active sequences, but it is not possible to decide whether a side chain is functionally or structurally important just because it is invariant or conserved. Making this distinction requires an independent assay of protein folding. The ability of a mutant protein to maintain a stably folded structure can often be measured by biophysical techniques, by susceptibility to intracellular proteolysis (26), or by binding to antibodies specific for the native structure (27, 28). In the latter cases, it is possible to screen proteins in mutated clones for the ability to fold even if these proteins are inactive. Sets of sequences that allow formation of a stable structure can then be compared to the sets that allow both folding and function. The active site or binding residues are those that are variable in the set of stable proteins but invariant in the set of functional proteins. The DNA-binding residues of Arc repressor were identified by this method (8). The receptor-binding residues of human growth hormone were also identified by comparing the stabilities and activities of a set of mutant sequences (28). However, in this case, the mutants were generated as hybrid sequences between growth hormone and related hormones with different binding specificities.
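The comparison of stable and functional sequence sets reduces to a set difference over invariant positions, sketched below in Python. The function names and the toy sequences are hypothetical. In a real experiment the "stable" set would come from a folding assay such as an antibody or proteolysis screen, and the "functional" set from an activity selection.

```python
def invariant_positions(aligned):
    """Positions where every sequence in the set carries the same residue."""
    return {i for i, column in enumerate(zip(*aligned)) if len(set(column)) == 1}


def candidate_functional_positions(stable, functional):
    """Positions invariant among functional sequences but variable among stable ones.

    These are the candidates for active-site or binding residues; positions
    invariant in both sets may instead be needed for folding or stability.
    """
    return sorted(invariant_positions(functional) - invariant_positions(stable))


# Hypothetical aligned sets: 'stable' variants fold; 'functional' variants
# also pass the activity selection.
stable = ["MKRSL", "MERAL", "MKQSL", "MDRTL"]
functional = ["MKRSL", "MKRAL", "MKRTL"]
print(candidate_functional_positions(stable, functional))
```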

1308 



Implications for Structure Prediction

At present, the only reliable method for predicting a low-resolution tertiary structure of a new protein is by identifying sequence similarity to a protein whose structure is already known (29, 30). However, it is often difficult to align sequences as the level of sequence similarity decreases, and it is sometimes impossible to detect statistically significant sequence similarity between distantly related proteins. Because the number of known sequences is far greater than the number of known structures, it would be advantageous to increase the reach of the available structural information by improving methods for detecting distant sequence relations and for subsequently aligning these sequences based on structural principles. In a normal homology search, the sequence database is scanned with a single test sequence, and every residue must be weighted equally. However, some residues are more important than others and should be weighted accordingly. Moreover, certain regions of the protein are more likely to contain gaps than others. Both kinds of information can be obtained from sequence sets, and several techniques have




Fig. 3. Tolerance of positions in the NH2-terminal domain of λ repressor to hydrophilic side chains. The complex (43) of the repressor dimer (blue) and operator DNA (white) is shown. In (A), positions that can accept hydrophilic side chains are shown in orange. The same side chains are shown in (B) without the remaining protein atoms. In (C), positions that require hydrophobic or neutral side chains are shown in green. These side chains are shown in (D) without the remaining protein atoms. About three-fourths of the 92 side chains in the NH2-terminal domain are included in (B) and (D). The remaining positions have not been tested. Data are from (9, 14, 20, ..., 44).

SCIENCE, VOL. 247
been used to combine such information into more appropriately weighted sequence searches and alignments (31). These methods were used to align the sequences of retroviral proteases with aspartic proteases, which in turn allowed construction of a three-dimensional model for the protease of human immunodeficiency virus type 1 (29). Comparison with the recently determined crystal structure of this protein revealed reasonable agreement in many areas of the predicted structure (32).
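The idea of weighting residues unequally can be illustrated with a small Python sketch. Each column of an aligned family is weighted by its conservation, using 1 minus the Shannon entropy divided by log2(20). A query then earns a column's weight whenever its residue was observed in that column. This entropy weighting is a common simplification chosen for illustration. It is not the specific techniques of reference (31), and it ignores gap placement entirely. The function names and the toy family are hypothetical.

```python
import math
from collections import Counter


def conservation_weights(aligned):
    """Weight each alignment column by 1 - H / log2(20), H = Shannon entropy in bits.

    An invariant column gets weight 1.0; a column using all 20 residues
    equally often would get 0.0.
    """
    weights = []
    for column in zip(*aligned):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        weights.append(1.0 - entropy / math.log2(20))
    return weights


def weighted_match_score(query, aligned):
    """Ungapped score: add a column's weight when the query residue was seen there."""
    weights = conservation_weights(aligned)
    seen = [set(column) for column in zip(*aligned)]
    return sum(w for aa, w, ok in zip(query, weights, seen) if aa in ok)


family = ["GKLV", "GRLA", "GELS"]  # hypothetical aligned family members
print([round(w, 3) for w in conservation_weights(family)])
print(weighted_match_score("GQLT", family))  # matches only the two invariant columns
```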

The structural information at most surface sites is highly degenerate. Except for functionally important residues, exterior positions seem to be important chiefly in maintaining a reasonably polar surface. The information contained in buried residues is also degenerate, the main requirement being that these residues remain hydrophobic. Thus, at its most basic level, the key structural message in an amino acid sequence may reside in its specific pattern of hydrophobic and hydrophilic residues. This is meant in an informational sense. Clearly, the precise structure and stability of a protein depend on a large number of detailed interactions. It is possible, however, that structural prediction at a more primitive level can be accomplished by concentrating on the most basic informational aspects of an amino acid sequence. For example, amphipathic patterns can be extracted from aligned sets of sequences and used, in some cases, to identify secondary structures.

If a region of secondary structure is packed against the hydrophobic core, a pattern of hydrophobic residues reflecting the periodicity of the secondary structure is expected (33, 34). These patterns can be obscured in individual sequences by hydrophobic residues on the protein surface. It is rare, however, for a surface position to remain hydrophobic over the course of evolution. Consequently, the amphipathic patterns expected for simple secondary structures can be much clearer in a set of related sequences (6). This principle is illustrated in Fig. 4, which shows helical hydrophobic moment plots for the Antennapedia homeodomain sequence (Fig. 4A) and for a composite sequence derived from a set of homologous homeodomain proteins (Fig. 4B) (35). The hydrophobic moment is a simple measure of the degree of amphipathic character of a sequence in a given secondary structure (34). The amphipathic character of the three α-helical regions in the Antennapedia protein (36) is clearly revealed only by the analysis of the combined set of homeodomain sequences. The secondary structure of Arc repressor, a small DNA-binding protein, was recently predicted by a similar method (8) and confirmed by nuclear magnetic resonance studies (37).
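A helical hydrophobic moment plot can be sketched in Python. The moment of a segment is the magnitude of the sum of h_n * exp(i * n * delta), where h_n is the hydrophobicity of residue n and delta = 100 degrees per residue for an α-helix. The composite profile averages hydrophobicity over each aligned column of a homolog set. This is the step that lets surface hydrophobics in any single sequence wash out. The scale values are approximately the Eisenberg consensus values. Their exact numbers, the 11-residue window, and the toy homolog sequences are assumptions to check before reuse.

```python
import math

# Normalized consensus hydrophobicity values (approximately those of Eisenberg
# and co-workers; the exact numbers are an assumption, check before use).
HYDROPHOBICITY = {
    "I": 1.38, "F": 1.19, "V": 1.08, "L": 1.06, "W": 0.81, "M": 0.64,
    "A": 0.62, "G": 0.48, "C": 0.29, "Y": 0.26, "P": 0.12, "T": -0.05,
    "S": -0.18, "H": -0.40, "E": -0.74, "N": -0.78, "Q": -0.85, "D": -0.90,
    "K": -1.50, "R": -2.53,
}


def hydrophobic_moment(values, delta_deg=100.0):
    """Magnitude of sum_n h_n * exp(i * n * delta); 100 degrees suits an alpha-helix."""
    delta = math.radians(delta_deg)
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum)


def composite_profile(aligned):
    """Mean hydrophobicity at each aligned position across a set of homologs."""
    return [sum(HYDROPHOBICITY[aa] for aa in col) / len(col) for col in zip(*aligned)]


def moment_plot(values, window=11, delta_deg=100.0):
    """Moment of each sliding window (window length 11 is an assumed choice)."""
    return [hydrophobic_moment(values[i:i + window], delta_deg)
            for i in range(len(values) - window + 1)]


homologs = ["LKKLLEELAKKLLEA", "IRKLVEEIAKRLIEA"]  # hypothetical aligned segments
print([round(m, 2) for m in moment_plot(composite_profile(homologs))])
```

A uniformly hydrophobic segment gives a moment near zero, because 18 residues at 100 degrees complete five full turns. A segment whose hydrophobicity rises and falls with the helical period gives a large moment.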

The specific pattern of hydrophobic and hydrophilic residues in an amino acid sequence must limit the number of different structures a given sequence can adopt and may indeed define its overall fold. If this is true, then the arrangement of hydrophobic and hydrophilic residues should be a characteristic feature of a particular fold. Sweet and Eisenberg have shown that the correlation of the pattern of hydrophobicity between two protein sequences is a good criterion for their structural relatedness (38). In addition, several studies indicate that patterns of obligatory hydrophobic positions identified from aligned sequences are distinctive features of sequences that adopt the same structure (4, 29, 35, 39). Thus, the order of hydrophobic and hydrophilic residues in a sequence may actually be sufficient information to determine the basic folding pattern of a protein sequence.
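Comparing the hydrophobicity patterns of two sequences can be sketched in Python as a Pearson correlation of per-residue values along an ungapped alignment. To keep the sketch short, it codes each residue as hydrophobic (1) or not (0) using an assumed residue set. Sweet and Eisenberg used a graded hydrophobicity scale rather than this binary code, so this is a simplification and not their procedure.

```python
import math

HYDROPHOBIC = set("ACFILMVWY")  # assumption: binary hydrophobic/hydrophilic split


def hp_pattern(seq):
    """Code each residue as 1.0 (hydrophobic) or 0.0 (not)."""
    return [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / (sx * sy)


def pattern_correlation(seq_a, seq_b):
    """Correlation of hydrophobicity patterns for two aligned, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return pearson(hp_pattern(seq_a), hp_pattern(seq_b))


# Hypothetical segments: different residues, same hydrophobic/hydrophilic order.
print(round(pattern_correlation("LKELLKKL", "IRDVVEEA"), 3))
```

The two example segments share no identical residues, yet their patterns correlate perfectly. This is the sense in which the pattern, rather than the exact sequence, may carry the fold information.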

Although the pattern of sequence hydrophobicity may be a characteristic feature of a particular fold, it is not yet clear how such patterns could be used for prediction of structure de novo. It is important to understand how patterns in sequence space can be related to structures in conformation space. Lau and Dill have approached this problem by studying the properties of simple


[...] is shown in Fig. 5. Residues adjacent in the sequence occupy adjacent squares on the lattice, and two residues cannot occupy the same space. Free energies of particular conformations are evaluated with a single term, an attraction of H groups [...]. By considering chains of ten residues, an exhaustive conformational search for all 1024 possible sequences of H and P residues was possible. For longer sequences only a representative fraction of the allowed sequence or conformation space could be explored. The significant results were as follows: (i) not all sequences can form a "native" structure, and only a few sequences form a unique native structure; (ii) the probability that a sequence will adopt a native structure increases with chain length; and (iii) the native states are compact, contain a hydrophobic core surrounded by polar residues, and contain significant secondary structure. Although the gap between these two-dimensional simulations and three-dimensional structures is large, the use of simple rules and simplified representations yields results similar to those expected for proteins. Three-dimensional lattice methods are also beginning to be developed and evaluated (41).
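An HP lattice model of the kind described above can be sketched in Python. The sketch enumerates self-avoiding conformations on a two-dimensional square lattice, keeping one representative per rotation/reflection class. It scores each H/P sequence with a single attractive term of -1 per non-bonded H-H contact. It then asks whether exactly one conformation reaches the lowest energy. The unit contact energy, the symmetry reduction, and this definition of a "unique" native state are assumptions. The sketch is not meant to reproduce Lau and Dill's published numbers. Calling `count_unique_folders(10)` runs the ten-residue exhaustive case mentioned in the text, but it is noticeably slower in pure Python than the small example shown.

```python
from itertools import product

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def canonical_walks(n):
    """Self-avoiding walks of n residues on a 2-D square lattice, one per symmetry class.

    Fixing the first step to +x and the first turn (if any) to +y removes the
    rotations and reflections of each conformation.
    """
    walks = []

    def extend(path, occupied, turned):
        if len(path) == n:
            walks.append(list(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            site = (x + dx, y + dy)
            if site in occupied:
                continue  # excluded volume: one residue per lattice site
            if len(path) == 1 and (dx, dy) != (1, 0):
                continue  # first step fixed to +x
            if not turned and dy == -1:
                continue  # first turn fixed to +y
            path.append(site)
            occupied.add(site)
            extend(path, occupied, turned or dy != 0)
            occupied.remove(site)
            path.pop()

    extend([(0, 0)], {(0, 0)}, False)
    return walks


def contacts(walk):
    """Pairs (i, j) with j > i + 1 that sit on neighbouring lattice sites."""
    index = {site: i for i, site in enumerate(walk)}
    pairs = []
    for i, (x, y) in enumerate(walk):
        for dx, dy in STEPS:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                pairs.append((i, j))
    return pairs


def energy(sequence, contact_pairs):
    """Single attractive term: -1 per non-bonded H-H contact (unit strength assumed)."""
    return -sum(1 for i, j in contact_pairs if sequence[i] == "H" and sequence[j] == "H")


def has_unique_ground_state(sequence, contact_lists):
    """True if exactly one conformation reaches the lowest energy."""
    energies = [energy(sequence, pairs) for pairs in contact_lists]
    return energies.count(min(energies)) == 1


def count_unique_folders(n):
    """Exhaustively test all 2**n H/P sequences of length n."""
    contact_lists = [contacts(w) for w in canonical_walks(n)]
    return sum(has_unique_ground_state("".join(s), contact_lists)
               for s in product("HP", repeat=n))


print(len(canonical_walks(4)), count_unique_folders(4))
```

For four residues, only the U-shaped conformation forms a contact, so only sequences with H at both ends have a unique lowest-energy state.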



Summary

There is more information in a set of related sequences than in a single sequence. A number of practical applications arise from analysis of the tolerance of residue positions to change. First, this information permits the evaluation of a residue's importance to the function and stability of a protein. This ability to identify the essential elements of a protein sequence may improve our understanding of the determinants of protein folding and stability as well as protein function. Second, patterns of tolerance to amino acid substitutions of varying hydrophilicity can help to identify residues likely to be buried in a protein structure and those likely to
Fig. 5. A representation of one compact conformation for a particular sequence of H and P residues on a two-dimensional square lattice. [Adapted from (40), with permission of the American Chemical Society]



surface positions. The amphipathic patterns that emerge can be used to identify probable regions of secondary structure. Third, incorporating a knowledge of allowed substitutions can improve the ability to detect and align distantly related proteins because the essential residues can be given prominence in the alignment scoring.

As more sequences are determined, it becomes increasingly likely that a protein of interest is a member of a family of related sequences. If this is not the case, it is now possible to use genetic methods to generate lists of allowed amino acid substitutions. Consequently, at least in the short term, it may not be necessary to solve the folding problem for individual protein sequences. Instead, information from sequence sets could be used. Perhaps by simplifying sequence space through the identification of key residues, and by simplifying conformation space as in the lattice methods, it will be possible to develop algorithms to generate a limited number of trial structures. These trial structures could then, in turn, be evaluated by further experiments and more sophisticated energy calculations.